
    Iterative Embedding with Robust Correction using Feedback of Error Observed

    Abstract: Nonlinear dimensionality reduction techniques of today are highly sensitive to outliers. Almost all of them are spectral methods and differ from each other in how they treat the neighborhood similarities computed among the high-dimensional input data points. These techniques aim to preserve this similarity structure in the low-dimensional output. The presence of unwanted outliers in the data directly affects the preservation of these neighborhood similarities among the majority of non-outlier points, since those points must simultaneously satisfy the similarities they form with the outliers and the similarity structure they form with the other non-outlier data. This disrupts the intrinsic structure of the manifold on which the majority of the non-outlier data lies when it is preserved via a homeomorphism onto a low-dimensional manifold. In this paper we propose an iterative algorithm that analytically solves for a nonlinear embedding, with monotonic improvement after each iteration. As an application of this iterative manifold learning algorithm, we develop a framework that decomposes the pair-wise error observed between all pairs of points and updates the neighborhood similarity matrix dynamically, downplaying the effect of the outliers on the majority of the non-outlier data being embedded into a lower dimension.
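    The error-feedback reweighting described in the abstract can be sketched in a few lines. The Python/NumPy fragment below is a hypothetical illustration only: it uses a Gaussian kernel for the similarities, a Laplacian-eigenmaps-style spectral solve as a stand-in for the analytic embedding step, and a simple exponential down-weighting of high-error pairs; none of these specific choices are confirmed by the paper.

    import numpy as np

    def robust_iterative_embedding(X, n_components=2, n_iters=20, sigma=1.0, beta=5.0):
        """Hypothetical sketch of embedding with error feedback, not the paper's exact algorithm."""
        n = X.shape[0]
        # Neighborhood similarities on the high-dimensional inputs (Gaussian kernel assumed).
        d2_high = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
        W = np.exp(-d2_high / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)

        Y = None
        for _ in range(n_iters):
            # Analytic embedding step: spectral solve with the current similarities
            # (a Laplacian-eigenmaps-style stand-in for the paper's closed-form update).
            d = W.sum(axis=1) + 1e-12
            d_inv_sqrt = 1.0 / np.sqrt(d)
            L_sym = np.eye(n) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
            _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
            Y = vecs[:, 1:n_components + 1]            # skip the trivial eigenvector

            # Pairwise error between the similarities implied by the embedding and W.
            d2_low = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
            err = np.abs(W - np.exp(-d2_low / (2.0 * sigma ** 2)))

            # Feedback step: down-weight pairs with large observed error so that
            # similarities involving outliers lose influence in later iterations.
            W *= np.exp(-beta * err)
            np.fill_diagonal(W, 0.0)
        return Y

    The point of the sketch is that the similarity matrix, rather than the data itself, absorbs the correction, so high-error (outlier) pairs gradually stop constraining the embedding of the non-outlier majority.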

    NoPeek: Information leakage reduction to share activations in distributed deep learning

    For distributed machine learning with sensitive data, we demonstrate how minimizing the distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy. Leakage (measured using the distance correlation between input and intermediate representations) is the risk associated with the invertibility of raw data from intermediary representations; this risk can deter client entities that hold sensitive data from using distributed deep learning services. Our method, which reduces the distance correlation between raw data and learned representations during training and inference on image datasets, is resilient to such reconstruction attacks. We prevent reconstruction of the raw data while maintaining the information required to sustain good classification accuracy.
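    The leakage term can be illustrated with a short sketch. The following Python/PyTorch fragment computes a (squared) sample distance correlation between flattened raw inputs and intermediate activations and adds it to an ordinary task loss; the function names and the weighting factor alpha are assumptions made for illustration, not the authors' released implementation.

    import torch

    def distance_correlation(x, z, eps=1e-9):
        """Squared sample distance correlation between two batches.
        x: (batch, d_x) flattened raw inputs, z: (batch, d_z) activations."""
        def centered_dist(a):
            d = torch.cdist(a, a, p=2)                    # pairwise Euclidean distances
            return d - d.mean(dim=0, keepdim=True) \
                     - d.mean(dim=1, keepdim=True) + d.mean()
        A, B = centered_dist(x), centered_dist(z)
        dcov2_xz = (A * B).mean()                          # distance covariance (squared)
        dcov2_xx = (A * A).mean()
        dcov2_zz = (B * B).mean()
        return dcov2_xz / (torch.sqrt(dcov2_xx * dcov2_zz) + eps)

    def nopeek_style_loss(inputs, activations, logits, targets, alpha=0.1):
        # Task loss plus a leakage penalty on the shared activations; `alpha`
        # trades off accuracy against invertibility of the raw data (assumed weighting).
        task_loss = torch.nn.functional.cross_entropy(logits, targets)
        leakage = distance_correlation(inputs.flatten(1), activations.flatten(1))
        return task_loss + alpha * leakage

    Minimizing this combined objective pushes the activations shared across clients to carry little information useful for reconstructing the raw inputs, while the task-loss term preserves classification accuracy.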